Date Source Site ID POC Daily Mean PM2.5 Concentration Units
<char> <char> <int> <int> <num> <char>
1: 01/05/2002 AQS 60010007 1 25.1 ug/m3 LC
2: 01/06/2002 AQS 60010007 1 31.6 ug/m3 LC
3: 01/08/2002 AQS 60010007 1 21.4 ug/m3 LC
4: 01/11/2002 AQS 60010007 1 25.9 ug/m3 LC
5: 01/14/2002 AQS 60010007 1 34.5 ug/m3 LC
6: 01/17/2002 AQS 60010007 1 41.0 ug/m3 LC
Daily AQI Value Local Site Name Daily Obs Count Percent Complete
<int> <char> <int> <num>
1: 81 Livermore 1 100
2: 93 Livermore 1 100
3: 74 Livermore 1 100
4: 82 Livermore 1 100
5: 98 Livermore 1 100
6: 115 Livermore 1 100
AQS Parameter Code AQS Parameter Description Method Code
<int> <char> <int>
1: 88101 PM2.5 - Local Conditions 120
2: 88101 PM2.5 - Local Conditions 120
3: 88101 PM2.5 - Local Conditions 120
4: 88101 PM2.5 - Local Conditions 120
5: 88101 PM2.5 - Local Conditions 120
6: 88101 PM2.5 - Local Conditions 120
Method Description CBSA Code
<char> <int>
1: Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
2: Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
3: Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
4: Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
5: Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
6: Andersen RAAS2.5-300 PM2.5 SEQ w/WINS 41860
CBSA Name State FIPS Code State
<char> <int> <char>
1: San Francisco-Oakland-Hayward, CA 6 California
2: San Francisco-Oakland-Hayward, CA 6 California
3: San Francisco-Oakland-Hayward, CA 6 California
4: San Francisco-Oakland-Hayward, CA 6 California
5: San Francisco-Oakland-Hayward, CA 6 California
6: San Francisco-Oakland-Hayward, CA 6 California
County FIPS Code County Site Latitude Site Longitude
<int> <char> <num> <num>
1: 1 Alameda 37.68753 -121.7842
2: 1 Alameda 37.68753 -121.7842
3: 1 Alameda 37.68753 -121.7842
4: 1 Alameda 37.68753 -121.7842
5: 1 Alameda 37.68753 -121.7842
6: 1 Alameda 37.68753 -121.7842
tail(Ca2002)
Date Source Site ID POC Daily Mean PM2.5 Concentration Units
<char> <char> <int> <int> <num> <char>
1: 12/10/2002 AQS 61131003 1 15 ug/m3 LC
2: 12/13/2002 AQS 61131003 1 15 ug/m3 LC
3: 12/22/2002 AQS 61131003 1 1 ug/m3 LC
4: 12/25/2002 AQS 61131003 1 23 ug/m3 LC
5: 12/28/2002 AQS 61131003 1 5 ug/m3 LC
6: 12/31/2002 AQS 61131003 1 6 ug/m3 LC
Daily AQI Value Local Site Name Daily Obs Count Percent Complete
<int> <char> <int> <num>
1: 62 Woodland-Gibson Road 1 100
2: 62 Woodland-Gibson Road 1 100
3: 6 Woodland-Gibson Road 1 100
4: 77 Woodland-Gibson Road 1 100
5: 28 Woodland-Gibson Road 1 100
6: 33 Woodland-Gibson Road 1 100
AQS Parameter Code AQS Parameter Description Method Code
<int> <char> <int>
1: 88101 PM2.5 - Local Conditions 117
2: 88101 PM2.5 - Local Conditions 117
3: 88101 PM2.5 - Local Conditions 117
4: 88101 PM2.5 - Local Conditions 117
5: 88101 PM2.5 - Local Conditions 117
6: 88101 PM2.5 - Local Conditions 117
Method Description CBSA Code
<char> <int>
1: R & P Model 2000 PM2.5 Sampler w/WINS 40900
2: R & P Model 2000 PM2.5 Sampler w/WINS 40900
3: R & P Model 2000 PM2.5 Sampler w/WINS 40900
4: R & P Model 2000 PM2.5 Sampler w/WINS 40900
5: R & P Model 2000 PM2.5 Sampler w/WINS 40900
6: R & P Model 2000 PM2.5 Sampler w/WINS 40900
CBSA Name State FIPS Code State
<char> <int> <char>
1: Sacramento--Roseville--Arden-Arcade, CA 6 California
2: Sacramento--Roseville--Arden-Arcade, CA 6 California
3: Sacramento--Roseville--Arden-Arcade, CA 6 California
4: Sacramento--Roseville--Arden-Arcade, CA 6 California
5: Sacramento--Roseville--Arden-Arcade, CA 6 California
6: Sacramento--Roseville--Arden-Arcade, CA 6 California
County FIPS Code County Site Latitude Site Longitude
<int> <char> <num> <num>
1: 113 Yolo 38.66121 -121.7327
2: 113 Yolo 38.66121 -121.7327
3: 113 Yolo 38.66121 -121.7327
4: 113 Yolo 38.66121 -121.7327
5: 113 Yolo 38.66121 -121.7327
6: 113 Yolo 38.66121 -121.7327
Examine the variable names and variable types.
str(Ca2002)
Classes 'data.table' and 'data.frame': 15976 obs. of 22 variables:
$ Date : chr "01/05/2002" "01/06/2002" "01/08/2002" "01/11/2002" ...
$ Source : chr "AQS" "AQS" "AQS" "AQS" ...
$ Site ID : int 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 ...
$ POC : int 1 1 1 1 1 1 1 1 1 1 ...
$ Daily Mean PM2.5 Concentration: num 25.1 31.6 21.4 25.9 34.5 41 29.3 15 18.8 37.9 ...
$ Units : chr "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" ...
$ Daily AQI Value : int 81 93 74 82 98 115 89 62 69 107 ...
$ Local Site Name : chr "Livermore" "Livermore" "Livermore" "Livermore" ...
$ Daily Obs Count : int 1 1 1 1 1 1 1 1 1 1 ...
$ Percent Complete : num 100 100 100 100 100 100 100 100 100 100 ...
$ AQS Parameter Code : int 88101 88101 88101 88101 88101 88101 88101 88101 88101 88101 ...
$ AQS Parameter Description : chr "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" ...
$ Method Code : int 120 120 120 120 120 120 120 120 120 120 ...
$ Method Description : chr "Andersen RAAS2.5-300 PM2.5 SEQ w/WINS" "Andersen RAAS2.5-300 PM2.5 SEQ w/WINS" "Andersen RAAS2.5-300 PM2.5 SEQ w/WINS" "Andersen RAAS2.5-300 PM2.5 SEQ w/WINS" ...
$ CBSA Code : int 41860 41860 41860 41860 41860 41860 41860 41860 41860 41860 ...
$ CBSA Name : chr "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" ...
$ State FIPS Code : int 6 6 6 6 6 6 6 6 6 6 ...
$ State : chr "California" "California" "California" "California" ...
$ County FIPS Code : int 1 1 1 1 1 1 1 1 1 1 ...
$ County : chr "Alameda" "Alameda" "Alameda" "Alameda" ...
$ Site Latitude : num 37.7 37.7 37.7 37.7 37.7 ...
$ Site Longitude : num -122 -122 -122 -122 -122 ...
- attr(*, ".internal.selfref")=<externalptr>
summary(Ca2002$`Daily Mean PM2.5 Concentration`)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 7.00 12.00 16.12 20.50 104.30
sum(is.na(Ca2002$`Daily Mean PM2.5 Concentration`))
[1] 0
sum(Ca2002$`Daily Mean PM2.5 Concentration`=="")
[1] 0
hist(Ca2002$`Daily Mean PM2.5 Concentration`)
Summary of the PM2.5 California 2002 Dataset
The 2002 dataset of daily average PM2.5 concentrations at all sites in California has 15,976 rows (observations) and 22 columns (variables). There are no missing values for the daily mean PM2.5 concentration, and no observations are coded as NA, "", 999, or 9999. The daily mean PM2.5 concentration ranges from 0 to 104.30 ug/m^3, which is plausible. There appear to be no major issues with the data.
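The NA and empty-string checks are shown above, but the check for sentinel codes such as 999 or 9999 is not. A minimal sketch of one way to count them, using a small made-up vector (`pm_example` and `sentinels` are hypothetical names, not part of the original analysis):

```r
# Hypothetical stand-in for Ca2002$`Daily Mean PM2.5 Concentration`
pm_example <- c(25.1, 31.6, 21.4, 999, 34.5, NA)

# Codes that sometimes stand in for missing readings (assumed list)
sentinels <- c(999, 9999)

# Count true missing values and sentinel-coded values separately;
# NA %in% sentinels is FALSE, so the NA is not counted twice
n_missing  <- sum(is.na(pm_example))
n_sentinel <- sum(pm_example %in% sentinels)
```

Because the observed maximum is 104.30, neither sentinel code can be present in the real 2002 column.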
Checking the PM2.5 California 2022 Dataset
Check the size of the data.
dim(Ca2022)
[1] 59756 22
Look at the top and bottom of the data.
head(Ca2022)
Date Source Site ID POC Daily Mean PM2.5 Concentration Units
<char> <char> <int> <int> <num> <char>
1: 01/01/2022 AQS 60010007 3 12.7 ug/m3 LC
2: 01/02/2022 AQS 60010007 3 13.9 ug/m3 LC
3: 01/03/2022 AQS 60010007 3 7.1 ug/m3 LC
4: 01/04/2022 AQS 60010007 3 3.7 ug/m3 LC
5: 01/05/2022 AQS 60010007 3 4.2 ug/m3 LC
6: 01/06/2022 AQS 60010007 3 3.8 ug/m3 LC
Daily AQI Value Local Site Name Daily Obs Count Percent Complete
<int> <char> <int> <num>
1: 58 Livermore 1 100
2: 60 Livermore 1 100
3: 39 Livermore 1 100
4: 21 Livermore 1 100
5: 23 Livermore 1 100
6: 21 Livermore 1 100
AQS Parameter Code AQS Parameter Description Method Code
<int> <char> <int>
1: 88101 PM2.5 - Local Conditions 170
2: 88101 PM2.5 - Local Conditions 170
3: 88101 PM2.5 - Local Conditions 170
4: 88101 PM2.5 - Local Conditions 170
5: 88101 PM2.5 - Local Conditions 170
6: 88101 PM2.5 - Local Conditions 170
Method Description CBSA Code
<char> <int>
1: Met One BAM-1020 Mass Monitor w/VSCC 41860
2: Met One BAM-1020 Mass Monitor w/VSCC 41860
3: Met One BAM-1020 Mass Monitor w/VSCC 41860
4: Met One BAM-1020 Mass Monitor w/VSCC 41860
5: Met One BAM-1020 Mass Monitor w/VSCC 41860
6: Met One BAM-1020 Mass Monitor w/VSCC 41860
CBSA Name State FIPS Code State
<char> <int> <char>
1: San Francisco-Oakland-Hayward, CA 6 California
2: San Francisco-Oakland-Hayward, CA 6 California
3: San Francisco-Oakland-Hayward, CA 6 California
4: San Francisco-Oakland-Hayward, CA 6 California
5: San Francisco-Oakland-Hayward, CA 6 California
6: San Francisco-Oakland-Hayward, CA 6 California
County FIPS Code County Site Latitude Site Longitude
<int> <char> <num> <num>
1: 1 Alameda 37.68753 -121.7842
2: 1 Alameda 37.68753 -121.7842
3: 1 Alameda 37.68753 -121.7842
4: 1 Alameda 37.68753 -121.7842
5: 1 Alameda 37.68753 -121.7842
6: 1 Alameda 37.68753 -121.7842
tail(Ca2022)
Date Source Site ID POC Daily Mean PM2.5 Concentration Units
<char> <char> <int> <int> <num> <char>
1: 12/01/2022 AQS 61131003 1 3.4 ug/m3 LC
2: 12/07/2022 AQS 61131003 1 3.8 ug/m3 LC
3: 12/13/2022 AQS 61131003 1 6.0 ug/m3 LC
4: 12/19/2022 AQS 61131003 1 34.8 ug/m3 LC
5: 12/25/2022 AQS 61131003 1 23.2 ug/m3 LC
6: 12/31/2022 AQS 61131003 1 1.0 ug/m3 LC
Daily AQI Value Local Site Name Daily Obs Count Percent Complete
<int> <char> <int> <num>
1: 19 Woodland-Gibson Road 1 100
2: 21 Woodland-Gibson Road 1 100
3: 33 Woodland-Gibson Road 1 100
4: 99 Woodland-Gibson Road 1 100
5: 77 Woodland-Gibson Road 1 100
6: 6 Woodland-Gibson Road 1 100
AQS Parameter Code AQS Parameter Description Method Code
<int> <char> <int>
1: 88101 PM2.5 - Local Conditions 145
2: 88101 PM2.5 - Local Conditions 145
3: 88101 PM2.5 - Local Conditions 145
4: 88101 PM2.5 - Local Conditions 145
5: 88101 PM2.5 - Local Conditions 145
6: 88101 PM2.5 - Local Conditions 145
Method Description CBSA Code
<char> <int>
1: R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
2: R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
3: R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
4: R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
5: R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
6: R & P Model 2025 PM-2.5 Sequential Air Sampler w/VSCC 40900
CBSA Name State FIPS Code State
<char> <int> <char>
1: Sacramento--Roseville--Arden-Arcade, CA 6 California
2: Sacramento--Roseville--Arden-Arcade, CA 6 California
3: Sacramento--Roseville--Arden-Arcade, CA 6 California
4: Sacramento--Roseville--Arden-Arcade, CA 6 California
5: Sacramento--Roseville--Arden-Arcade, CA 6 California
6: Sacramento--Roseville--Arden-Arcade, CA 6 California
County FIPS Code County Site Latitude Site Longitude
<int> <char> <num> <num>
1: 113 Yolo 38.66121 -121.7327
2: 113 Yolo 38.66121 -121.7327
3: 113 Yolo 38.66121 -121.7327
4: 113 Yolo 38.66121 -121.7327
5: 113 Yolo 38.66121 -121.7327
6: 113 Yolo 38.66121 -121.7327
Look at the variables.
str(Ca2022)
Classes 'data.table' and 'data.frame': 59756 obs. of 22 variables:
$ Date : chr "01/01/2022" "01/02/2022" "01/03/2022" "01/04/2022" ...
$ Source : chr "AQS" "AQS" "AQS" "AQS" ...
$ Site ID : int 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 60010007 ...
$ POC : int 3 3 3 3 3 3 3 3 3 3 ...
$ Daily Mean PM2.5 Concentration: num 12.7 13.9 7.1 3.7 4.2 3.8 2.3 6.9 13.6 11.2 ...
$ Units : chr "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" "ug/m3 LC" ...
$ Daily AQI Value : int 58 60 39 21 23 21 13 38 59 55 ...
$ Local Site Name : chr "Livermore" "Livermore" "Livermore" "Livermore" ...
$ Daily Obs Count : int 1 1 1 1 1 1 1 1 1 1 ...
$ Percent Complete : num 100 100 100 100 100 100 100 100 100 100 ...
$ AQS Parameter Code : int 88101 88101 88101 88101 88101 88101 88101 88101 88101 88101 ...
$ AQS Parameter Description : chr "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" "PM2.5 - Local Conditions" ...
$ Method Code : int 170 170 170 170 170 170 170 170 170 170 ...
$ Method Description : chr "Met One BAM-1020 Mass Monitor w/VSCC" "Met One BAM-1020 Mass Monitor w/VSCC" "Met One BAM-1020 Mass Monitor w/VSCC" "Met One BAM-1020 Mass Monitor w/VSCC" ...
$ CBSA Code : int 41860 41860 41860 41860 41860 41860 41860 41860 41860 41860 ...
$ CBSA Name : chr "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" "San Francisco-Oakland-Hayward, CA" ...
$ State FIPS Code : int 6 6 6 6 6 6 6 6 6 6 ...
$ State : chr "California" "California" "California" "California" ...
$ County FIPS Code : int 1 1 1 1 1 1 1 1 1 1 ...
$ County : chr "Alameda" "Alameda" "Alameda" "Alameda" ...
$ Site Latitude : num 37.7 37.7 37.7 37.7 37.7 ...
$ Site Longitude : num -122 -122 -122 -122 -122 ...
- attr(*, ".internal.selfref")=<externalptr>
summary(Ca2022$`Daily Mean PM2.5 Concentration`)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-6.700 4.100 6.800 8.429 10.700 302.500
sum(is.na(Ca2022$`Daily Mean PM2.5 Concentration`))
[1] 0
sum(Ca2022$`Daily Mean PM2.5 Concentration`=="")
[1] 0
hist(Ca2022$`Daily Mean PM2.5 Concentration`)
Summary of the PM2.5 California 2022 Dataset
The 2022 dataset of daily average PM2.5 concentrations at all sites in California has 59,756 rows (observations) and 22 columns (variables). There are no missing values for the daily mean PM2.5 concentration, and no observations are coded as NA, "", 999, or 9999.
The daily mean PM2.5 concentration ranges from -6.7 to 302.5 ug/m^3. Technically, the minimum concentration should be 0, since it is not possible to have a negative amount of particles in the air. However, according to the EPA, valid negative numbers should be included when reporting to databases (https://www.epa.gov/sites/default/files/2016-10/documents/pm2.5_continuous_monitoring.pdf). The AQS generally accepts negative values down to -10 ug/m^3. Therefore, I will leave the negative values in this dataset. The maximum is within the range of plausible values.
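To make the -10 ug/m^3 rule concrete, here is a small sketch that separates negative readings inside the accepted range from readings below it. The vector and the name `aqs_lower_bound` are illustrative assumptions; only the -10 threshold comes from the text above.

```r
# Hypothetical readings, including a few negative values
pm_example <- c(-6.7, -0.4, 0, 3.8, 12.7, 302.5)

# Lower bound quoted in the text (AQS generally accepts values down to -10)
aqs_lower_bound <- -10

n_negative     <- sum(pm_example < 0)                # negative, possibly valid
n_below_bound  <- sum(pm_example < aqs_lower_bound)  # outside the accepted range
share_negative <- mean(pm_example < 0)               # proportion of negatives
```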
2. Combine the two years of data into one data frame, create date variable, and change the variable names.
Combine the two years of data into one data frame.
combined_ca <-rbind(Ca2002, Ca2022)
Use the Date variable to create a new column for year.
combined_ca$Date <- as.Date(combined_ca$Date, format = "%m/%d/%Y")
combined_ca$Year <- format(combined_ca$Date, "%Y")
Change the names of the key variables so they are easier to refer to.
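The later output uses the column names `PM2.5`, `lat`, and `lon`. A sketch of a rename that would produce those names, written in base R on a made-up two-row data frame (`example_df` and `new_names` are hypothetical, and the original analysis may have used a different function):

```r
# Hypothetical stand-in for combined_ca; check.names = FALSE keeps the
# spaces in the original column names
example_df <- data.frame(
  `Daily Mean PM2.5 Concentration` = c(25.1, 12.7),
  `Site Latitude` = c(37.68753, 37.68753),
  `Site Longitude` = c(-121.7842, -121.7842),
  check.names = FALSE
)

# Old name -> new name; the new names match those seen in later output
new_names <- c(
  "Daily Mean PM2.5 Concentration" = "PM2.5",
  "Site Latitude" = "lat",
  "Site Longitude" = "lon"
)

# Rename only the columns listed in new_names
idx <- match(names(new_names), names(example_df))
names(example_df)[idx] <- unname(new_names)
```

For a data.table, `setnames(x, old, new)` performs the same rename by reference.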
3. Create a basic map in leaflet() that shows the locations of the sites (make sure to use different colors for each year). Summarize the spatial distribution of the monitoring sites.
pm_stations <- unique(combined_ca[, c("lat", "lon", "Year", "Local Site Name")])
table(pm_stations$Year)
There are larger clusters of monitoring sites near Los Angeles, San Francisco, and Sacramento. There appear to be more monitoring sites on the western side of California than on the eastern side. The majority of the monitoring sites in the 2002 dataset are also present in the 2022 dataset. The additional sites that appear only in the 2022 dataset are scattered throughout California.
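A minimal sketch of how a map of this kind could be drawn with leaflet, using one color per year. The four stations, the colors, the tile provider, and the names `stations_example`, `year_pal`, and `site_map` are illustrative assumptions; in the analysis the input would be `pm_stations`.

```r
library(leaflet)

# Hypothetical stations standing in for pm_stations
stations_example <- data.frame(
  lat  = c(37.68753, 38.66121, 37.68753, 39.32783),
  lon  = c(-121.7842, -121.7327, -121.7842, -120.1846),
  Year = c("2002", "2002", "2022", "2022")
)

# One color per year (the colors are an arbitrary choice)
year_pal <- colorFactor(c("blue", "red"), domain = stations_example$Year)

# Circles colored by year, with a legend explaining the colors
site_map <- leaflet(stations_example) |>
  addProviderTiles("CartoDB.Positron") |>
  addCircles(lat = ~lat, lng = ~lon, color = ~year_pal(Year),
             opacity = 1, fillOpacity = 1, radius = 500) |>
  addLegend("bottomleft", pal = year_pal, values = ~Year,
            title = "Year", opacity = 1)
```

Printing `site_map` in an interactive session or a rendered document displays the map.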
4. Check for any missing or implausible values of PM2.5 in the combined dataset. Explore the proportions of each and provide a summary of any temporal patterns you see in these observations.
Checking for any missing or implausible values in the combined dataset.
summary(combined_ca$PM2.5)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-6.70 4.50 7.60 10.05 12.20 302.50
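The two listings that follow show the lowest and the highest readings, which suggests the data were sorted by PM2.5 before `head()` and `tail()` were called. A sketch of such a sort, assuming base R `order()` and a made-up data frame (the sort actually used is not shown):

```r
# Hypothetical data: four readings at four sites
example_df <- data.frame(
  site  = c("A", "B", "C", "D"),
  PM2.5 = c(12.7, -6.7, 302.5, 3.8)
)

# Sort rows from the lowest to the highest concentration
sorted_df <- example_df[order(example_df$PM2.5), ]

lowest  <- head(sorted_df, 2)  # most negative / smallest readings
highest <- tail(sorted_df, 2)  # largest readings
```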
Date Source Site ID POC PM2.5 Units Daily AQI Value
<Date> <char> <int> <int> <num> <char> <int>
1: 2022-09-20 AQS 60571001 5 -6.7 ug/m3 LC 0
2: 2022-09-19 AQS 60571001 5 -6.3 ug/m3 LC 0
3: 2022-09-21 AQS 60571001 5 -5.1 ug/m3 LC 0
4: 2022-09-03 AQS 60571001 5 -4.7 ug/m3 LC 0
5: 2022-09-22 AQS 60571001 5 -4.7 ug/m3 LC 0
6: 2022-09-04 AQS 60571001 5 -4.1 ug/m3 LC 0
Local Site Name Daily Obs Count Percent Complete AQS Parameter Code
<char> <int> <num> <int>
1: Truckee-Fire Station 1 100 88502
2: Truckee-Fire Station 1 100 88502
3: Truckee-Fire Station 1 100 88502
4: Truckee-Fire Station 1 100 88502
5: Truckee-Fire Station 1 100 88502
6: Truckee-Fire Station 1 100 88502
AQS Parameter Description Method Code Method Description
<char> <int> <char>
1: Acceptable PM2.5 AQI & Speciation Mass 733 Met-One BAM W/PM2.5 VSCC
2: Acceptable PM2.5 AQI & Speciation Mass 733 Met-One BAM W/PM2.5 VSCC
3: Acceptable PM2.5 AQI & Speciation Mass 733 Met-One BAM W/PM2.5 VSCC
4: Acceptable PM2.5 AQI & Speciation Mass 733 Met-One BAM W/PM2.5 VSCC
5: Acceptable PM2.5 AQI & Speciation Mass 733 Met-One BAM W/PM2.5 VSCC
6: Acceptable PM2.5 AQI & Speciation Mass 733 Met-One BAM W/PM2.5 VSCC
CBSA Code CBSA Name State FIPS Code State
<int> <char> <int> <char>
1: 46020 Truckee-Grass Valley, CA 6 California
2: 46020 Truckee-Grass Valley, CA 6 California
3: 46020 Truckee-Grass Valley, CA 6 California
4: 46020 Truckee-Grass Valley, CA 6 California
5: 46020 Truckee-Grass Valley, CA 6 California
6: 46020 Truckee-Grass Valley, CA 6 California
County FIPS Code County lat lon Year
<int> <char> <num> <num> <char>
1: 57 Nevada 39.32783 -120.1846 2022
2: 57 Nevada 39.32783 -120.1846 2022
3: 57 Nevada 39.32783 -120.1846 2022
4: 57 Nevada 39.32783 -120.1846 2022
5: 57 Nevada 39.32783 -120.1846 2022
6: 57 Nevada 39.32783 -120.1846 2022
tail(combined_ca)
Date Source Site ID POC PM2.5 Units Daily AQI Value
<Date> <char> <int> <int> <num> <char> <int>
1: 2022-09-10 AQS 60570005 3 218.2 ug/m3 LC 293
2: 2022-09-10 AQS 60610004 3 243.9 ug/m3 LC 338
3: 2022-08-15 AQS 61050002 1 244.7 ug/m3 LC 339
4: 2022-08-14 AQS 61050002 1 246.2 ug/m3 LC 342
5: 2022-09-16 AQS 60611004 3 296.3 ug/m3 LC 442
6: 2022-07-31 AQS 60932001 3 302.5 ug/m3 LC 454
Local Site Name Daily Obs Count Percent Complete
<char> <int> <num>
1: Grass Valley-Litton Building 1 100
2: Colfax-City Hall 1 100
3: Weaverville-Courthouse 1 100
4: Weaverville-Courthouse 1 100
5: Tahoe City-Fairway Drive 1 100
6: Yreka 1 100
AQS Parameter Code AQS Parameter Description Method Code
<int> <char> <int>
1: 88101 PM2.5 - Local Conditions 209
2: 88502 Acceptable PM2.5 AQI & Speciation Mass 731
3: 88502 Acceptable PM2.5 AQI & Speciation Mass 731
4: 88502 Acceptable PM2.5 AQI & Speciation Mass 731
5: 88502 Acceptable PM2.5 AQI & Speciation Mass 731
6: 88101 PM2.5 - Local Conditions 170
Method Description CBSA Code
<char> <int>
1: Met One BAM-1022 Mass Monitor w/ VSCC or TE-PM2.5C 46020
2: Met-One BAM-1020 W/PM2.5 SCC 40900
3: Met-One BAM-1020 W/PM2.5 SCC NA
4: Met-One BAM-1020 W/PM2.5 SCC NA
5: Met-One BAM-1020 W/PM2.5 SCC 40900
6: Met One BAM-1020 Mass Monitor w/VSCC NA
CBSA Name State FIPS Code State
<char> <int> <char>
1: Truckee-Grass Valley, CA 6 California
2: Sacramento--Roseville--Arden-Arcade, CA 6 California
3: 6 California
4: 6 California
5: Sacramento--Roseville--Arden-Arcade, CA 6 California
6: 6 California
County FIPS Code County lat lon Year
<int> <char> <num> <num> <char>
1: 57 Nevada 39.23348 -121.0556 2022
2: 61 Placer 39.10017 -120.9538 2022
3: 105 Trinity 40.73475 -122.9412 2022
4: 105 Trinity 40.73475 -122.9412 2022
5: 61 Placer 39.16602 -120.1488 2022
6: 93 Siskiyou 41.72689 -122.6336 2022
hist(combined_ca$PM2.5)
There are no missing values of PM2.5 in the combined dataset.
The daily average PM2.5 concentrations range from -6.70 to 302.50 ug/m^3. As mentioned above, the minimum should technically be 0, since it is not possible to have a negative amount of particles in the air. However, according to the EPA, valid negative numbers should be included when reporting to databases, and the AQS generally accepts negative values down to -10 ug/m^3.
The maximum PM2.5 value, 302.5 ug/m^3, was recorded on 07/31/2022 in Yreka, CA. This value seems plausible, as a large wildfire, the McKinney Fire, was burning near Yreka on 07/31/2022.
Explore the proportions of missing values and implausible values, and provide a summary of any temporal patterns you see in these observations.
In this case, I am assuming that negative values are implausible.
mean(is.na(combined_ca$PM2.5))
[1] 0
The proportion of PM2.5 concentration values that are missing is 0%.
mean(combined_ca$PM2.5<0, na.rm =TRUE)
[1] 0.002838958
The proportion of PM2.5 concentration values less than 0 is 0.28%. This is a very low percentage, and I am not certain these values are implausible. Therefore, I will leave them in the dataset.
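To look for a temporal pattern in the negative values, the same proportion can be computed per month. A sketch using base R `tapply()` on a made-up data frame (`example_df` and `neg_share_by_month` are hypothetical names):

```r
# Hypothetical readings across three months
example_df <- data.frame(
  Date  = as.Date(c("2022-01-05", "2022-01-20", "2022-09-19",
                    "2022-09-20", "2022-09-21", "2022-10-02")),
  PM2.5 = c(4.2, -0.3, -6.3, -6.7, 5.1, 8.0)
)

# Month as a two-digit string, e.g. "09"
example_df$Month <- format(example_df$Date, "%m")

# The mean of a logical vector is the share of TRUE values,
# so this gives the proportion of negative readings in each month
neg_share_by_month <- tapply(example_df$PM2.5 < 0, example_df$Month, mean)
```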
library(ggplot2)
combined_ca[combined_ca$Year == 2002, ] |>
  ggplot() +
  geom_point(mapping = aes(x = Date, y = PM2.5)) +
  labs(x = "Date", y = "Daily Average PM2.5 Concentration (ug/m^3)",
       title = "Daily Average PM2.5 Concentrations in California, 2002")
combined_ca[combined_ca$Year == 2022, ] |>
  ggplot() +
  geom_point(mapping = aes(x = Date, y = PM2.5)) +
  labs(x = "Date", y = "Daily Average PM2.5 Concentration (ug/m^3)",
       title = "Daily Average PM2.5 Concentrations in California, 2022")
The scatterplots above show that daily average PM2.5 concentration values were recorded for every day of the year in both 2002 and 2022.
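Date coverage can also be checked directly instead of being read off a plot. A sketch that lists calendar days with no reading, using a made-up four-day window (`example_df`, `all_days`, and `missing_days` are hypothetical names; for a full year the window would run from January 1 to December 31):

```r
# Hypothetical dates with at least one reading; January 3 is absent
example_df <- data.frame(
  Date = as.Date(c("2002-01-01", "2002-01-02", "2002-01-02", "2002-01-04"))
)

# Every calendar day in the window being checked
all_days <- seq(as.Date("2002-01-01"), as.Date("2002-01-04"), by = "day")

# Days in the window that never appear in the data
missing_days <- all_days[!all_days %in% example_df$Date]
```

An empty `missing_days` would confirm that every day in the window has at least one reading.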
combined_ca |>
  filter(PM2.5 < 0) |>
  ggplot() +
  geom_point(mapping = aes(x = Date, y = PM2.5)) +
  labs(x = "Date", y = "Daily Average PM2.5 Concentration (ug/m^3)",
       title = "Negative Daily Average PM2.5 Concentrations in California, 2002 and 2022")
This scatterplot shows when negative daily average PM2.5 concentrations were recorded during 2002 and 2022. No negative PM2.5 concentrations were recorded in 2002. In 2022, negative PM2.5 values were recorded throughout the year, but the largest negative PM2.5 concentrations were recorded in September and October 2022.
5. Explore the main question of interest at three different spatial levels. Create exploratory plots (e.g. boxplots, histograms, line plots) and summary statistics that best suit each level of data. Write up explanations of what you observe in these data.
The daily average PM2.5 concentrations for all sites in California were averaged to generate a daily average PM2.5 concentration for California for each day of the year. This dataset was then used to generate the following graphs.
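A sketch of that averaging step, assuming base R `aggregate()` and a made-up data frame. The output column name `PM2.5_avg` matches the plotting code; the function actually used to build `combined_ca_avg` is not shown, so this is only one possible approach.

```r
# Hypothetical site-level readings: two sites on 2002-01-05, one otherwise
example_df <- data.frame(
  Date  = as.Date(c("2002-01-05", "2002-01-05", "2002-01-06", "2022-01-05")),
  Year  = c("2002", "2002", "2002", "2022"),
  PM2.5 = c(25.1, 15.0, 31.6, 4.2)
)

# Average across sites within each date (Year is kept as a grouping column)
example_avg <- aggregate(PM2.5 ~ Date + Year, data = example_df, FUN = mean)

# Rename the averaged column to the name used in the plots
names(example_avg)[names(example_avg) == "PM2.5"] <- "PM2.5_avg"
```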
ggplot(combined_ca_avg) +
  geom_boxplot(mapping = aes(x = Year, y = PM2.5_avg, fill = Year)) +
  labs(x = "Year", y = "Daily Average PM2.5 Concentration (ug/m^3)",
       title = "Daily Average PM2.5 Concentrations for California, 2002 vs 2022")
Overall, the daily average PM2.5 concentrations of California were lower in 2022 compared to 2002. The median daily average PM2.5 concentration of California was approximately 16 ug/m^3 in 2002 and 8 ug/m^3 in 2022. The maximum mean daily PM2.5 concentration, excluding outliers, was approximately 36 ug/m^3 in 2002 and 15 ug/m^3 in 2022. The highest outlier was approximately 50 ug/m^3 in 2002 and 20 ug/m^3 in 2022.
ggplot(combined_ca_avg) +
  geom_histogram(mapping = aes(x = PM2.5_avg, fill = Year), color = "dimgrey",
                 binwidth = 2, position = "identity", alpha = 0.6) +
  labs(x = "Daily Average PM2.5 Concentration (ug/m^3)", y = "Number of Days",
       title = "Daily Average PM2.5 Concentrations for California, 2002 vs 2022")
Based on this histogram, it appears that the daily average PM2.5 concentrations for California have decreased from 2002 to 2022. The distribution of daily average PM2.5 concentrations for California in 2002 was right-skewed with a peak at 14 ug/m^3. The distribution of daily average PM2.5 concentrations for California in 2022 was slightly right-skewed with a peak at 8 ug/m^3. The range of daily average PM2.5 concentrations was approximately 4-51 ug/m^3 in 2002 and 3-19 ug/m^3 in 2022.
ggplot(data = combined_ca_avg |> mutate(Date = as.Date(format(Date, "2000-%m-%d")))) +
  geom_line(mapping = aes(x = Date, y = PM2.5_avg, color = Year)) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  labs(x = "Month", y = "Daily Average PM2.5 Concentration (ug/m^3)",
       title = "Daily Average PM2.5 Concentrations for California, 2002 vs 2022")
The daily average PM2.5 concentrations for California were generally lower in 2022 compared to 2002 for all months of the year. In 2002, the daily average PM2.5 concentrations ranged from approximately 5 ug/m^3 to 50 ug/m^3. In 2022, the daily average PM2.5 concentrations ranged from approximately 3 ug/m^3 to 19 ug/m^3. In 2002, the daily average PM2.5 concentrations for California were highest in November-December. In 2022, the daily average PM2.5 concentrations for California were highest in September.
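The line plot overlays both years on a single January-to-December axis by rewriting every date to the same placeholder year, 2000, while the `Year` column keeps the true year for coloring. A small sketch of that transformation on three made-up dates (`dates` and `aligned` are hypothetical names):

```r
# Hypothetical dates from both years
dates <- as.Date(c("2002-03-15", "2022-03-15", "2022-12-31"))

# Keep month and day, replace the year with the placeholder 2000.
# 2000 is a leap year, so a February 29 would also convert cleanly.
aligned <- as.Date(format(dates, "2000-%m-%d"))
```

After this step, March 15 of 2002 and March 15 of 2022 share the same x position.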
Summary statistics of PM2.5 concentration, by year, across all sites in California
Year Count Mean Median Min Max SD
1 2022 59756 8.428595 6.8 -6.7 302.5 7.644274
2 2002 15976 16.115943 12.0 0.0 104.3 13.867372
These statistics were generated from a dataset containing the daily average PM2.5 concentrations for all sites in California from 2002 and 2022. It appears that the daily concentrations of PM2.5 have decreased in California from 2002 to 2022. The median daily average PM2.5 concentration across all sites in California was 12 ug/m^3 in 2002 and 6.8 ug/m^3 in 2022. The maximum daily average PM2.5 concentration was 104.3 ug/m^3 in 2002 and 302.5 ug/m^3 in 2022. While the maximum daily average PM2.5 concentration was greater in 2022, the majority of daily average PM2.5 concentration values are lower in 2022 compared to 2002.
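A sketch of how a by-year table with these columns (Count, Mean, Median, Min, Max, SD) could be produced in base R. The data frame and the helper `summarise_pm` are hypothetical; the code that generated the table above is not shown.

```r
# Hypothetical site-level readings
example_df <- data.frame(
  Year  = c("2002", "2002", "2002", "2022", "2022"),
  PM2.5 = c(25.1, 31.6, 21.4, 12.7, 3.7)
)

# Hypothetical helper returning the six statistics for one group
summarise_pm <- function(x) {
  c(Count = length(x), Mean = mean(x), Median = median(x),
    Min = min(x), Max = max(x), SD = sd(x))
}

# Split readings by year, summarise each group, one row per year
stats_by_year <- t(sapply(split(example_df$PM2.5, example_df$Year), summarise_pm))
```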
The daily average PM2.5 concentrations for all sites in Los Angeles County (LAC) were averaged to generate a daily average PM2.5 concentration for LAC for each day of the year. This dataset was then used to generate the following graphs.
ggplot(combined_LAC_avg) +
  geom_boxplot(mapping = aes(x = Year, y = PM2.5_avg, fill = Year)) +
  labs(x = "Year", y = "Daily Average PM2.5 Concentration (ug/m^3)",
       title = "Daily Average PM2.5 Concentrations for Los Angeles County, 2002 vs 2022")
Generally, the daily average PM2.5 concentrations for Los Angeles County (LAC) were lower in 2022 compared to 2002. The median daily average PM2.5 concentration for LAC was approximately 18 ug/m^3 in 2002 and 11 ug/m^3 in 2022. The maximum daily average PM2.5 concentration, excluding outliers, for LAC was approximately 43 ug/m^3 in 2002 and 20 ug/m^3 in 2022. The interquartile range was narrower in 2022 compared to 2002.
ggplot(combined_LAC_avg) +
  geom_histogram(mapping = aes(x = PM2.5_avg, fill = Year), color = "dimgrey",
                 binwidth = 2, position = "identity", alpha = 0.6) +
  labs(x = "Daily Average PM2.5 Concentration (ug/m^3)", y = "Number of Days",
       title = "Daily Average PM2.5 Concentrations for Los Angeles County, 2002 vs 2022")
The daily average PM2.5 concentrations for Los Angeles County (LAC) were generally lower in 2022 compared to 2002. The distribution of daily average PM2.5 concentrations for LAC in 2002 was right-skewed with a peak at approximately 16 ug/m^3 and a second peak at 23 ug/m^3. The distribution of daily average PM2.5 concentrations for LAC in 2022 was slightly right-skewed with a peak at 11 ug/m^3.
ggplot(data = combined_LAC_avg |> mutate(Date = as.Date(format(Date, "2000-%m-%d")))) +
  geom_line(mapping = aes(x = Date, y = PM2.5_avg, color = Year)) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  labs(x = "Date", y = "Daily Average PM2.5 Concentration (ug/m^3)",
       title = "Daily Average PM2.5 Concentrations for Los Angeles County, 2002 vs 2022")
In general, the daily average PM2.5 concentrations of Los Angeles County were lower in 2022 compared to 2002 for all months of the year. The difference in PM2.5 concentrations was greatest for the months of October and December when comparing 2002 to 2022. The range of daily average PM2.5 concentrations was approximately 5-58 ug/m^3 in 2002 and 3-26 ug/m^3 in 2022.
Year Count Mean Median Min Max SD
1 2022 5070 10.97164 10.3 -1.2 56.0 5.238462
2 2002 1879 19.65604 17.4 0.6 72.4 11.884042
These statistics were generated from a dataset containing the daily average PM2.5 concentrations for all sites in Los Angeles County from 2002 and 2022. The daily average PM2.5 concentrations in Los Angeles County were lower in 2022 compared to 2002. The median daily average PM2.5 concentration across all sites in LAC was 17.4 ug/m^3 in 2002 and 10.3 ug/m^3 in 2022. The maximum daily average PM2.5 concentration was 72.4 ug/m^3 in 2002 and 56 ug/m^3 in 2022.
Site in Los Angeles: Pasadena
combined_pas <- combined_ca[combined_ca$`Local Site Name`=="Pasadena", ]
ggplot(combined_pas) +
  geom_boxplot(mapping = aes(x = Year, y = PM2.5, fill = Year)) +
  labs(x = "Year", y = "Daily Average PM2.5 Concentration (ug/m^3)",
       title = "Daily Average PM2.5 Concentrations for Pasadena, CA, 2002 vs 2022")
Overall, the daily average PM2.5 concentrations for the Pasadena site are lower in 2022 compared to 2002. The median daily average PM2.5 concentration was approximately 18 ug/m^3 in 2002 and 8 ug/m^3 in 2022. The maximum daily average PM2.5 concentration, excluding outliers, was approximately 45 ug/m^3 in 2002 and 19 ug/m^3 in 2022.
ggplot(combined_pas) +
  geom_histogram(mapping = aes(x = PM2.5, fill = Year), color = "dimgrey",
                 binwidth = 2, position = "identity", alpha = 0.6) +
  labs(x = "Daily Average PM2.5 Concentration (ug/m^3)", y = "Number of Days",
       title = "Daily Average PM2.5 Concentrations for Pasadena, CA, 2002 vs 2022")
Overall, the daily average PM2.5 concentrations at the Pasadena site were lower in 2022 compared to 2002. The distribution of the daily average PM2.5 concentrations for 2002 is right-skewed with a long right tail and peak at approximately 12 ug/m^3. The distribution of the daily average PM2.5 concentrations for 2022 is slightly right-skewed with a peak at 6 ug/m^3. Based on this graph, the range was approximately 3-59 ug/m^3 in 2002 and 3-23 ug/m^3 in 2022.
ggplot(data = combined_pas |> mutate(Date = as.Date(format(Date, "2000-%m-%d")))) +
  geom_line(mapping = aes(x = Date, y = PM2.5, color = Year)) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  labs(x = "Date", y = "Daily Average PM2.5 Concentration (ug/m^3)",
       title = "Daily Average PM2.5 Concentrations for Pasadena, CA, 2002 vs 2022")
Generally, the daily average PM2.5 concentrations were lower in 2022 compared to 2002 for all months of the year. Based on this graph, the range of daily average PM2.5 concentration values was approximately 4-58 ug/m^3 in 2002 and 4-22 ug/m^3 in 2022. The differences in PM2.5 concentrations were smallest during the months of May and June.
Year Count Mean Median Min Max SD
1 2022 120 9.094167 7.9 3.5 22.1 3.679726
2 2002 121 20.290909 17.8 4.0 57.8 11.143085
The daily average PM2.5 concentrations at the Pasadena, CA site were lower in 2022 compared to 2002. The median daily average PM2.5 concentration was 17.8 ug/m^3 in 2002 and 7.9 ug/m^3 in 2022. The maximum daily average PM2.5 concentration was 57.8 ug/m^3 in 2002 and 22.1 ug/m^3 in 2022.
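To put the three spatial levels on a common scale, the reported medians can be converted to a percent decrease from 2002 to 2022. The sketch below uses only the medians from the three summary tables above; the data frame name `medians` and the column `pct_decrease` are illustrative, and the calculation is a convenience rather than part of the original analysis.

```r
# Medians (ug/m^3) copied from the summary tables for each spatial level
medians <- data.frame(
  level       = c("California", "Los Angeles County", "Pasadena"),
  median_2002 = c(12.0, 17.4, 17.8),
  median_2022 = c(6.8, 10.3, 7.9)
)

# Percent decrease relative to the 2002 median
medians$pct_decrease <- 100 * (medians$median_2002 - medians$median_2022) /
  medians$median_2002
```

With these inputs the decrease is roughly 40-45% for California and Los Angeles County and about 56% for the Pasadena site.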